Calculate TPSA
Ertl, Rohde, and Selzer (J. Med. Chem., 43:3714-3717, 2000) published an algorithm for the fast calculation of molecular polar surface area (PSA). Part of it involves summing partial surface contributions of molecular fragments, where each fragment is defined by a SMARTS pattern. The goal of this task is to get an idea of how to perform a set of SMARTS matches when the patterns come from an external table.

In this case the data table comes from TJ O'Donnell's CHORD chemistry extension for PostgreSQL, listed at http://www.gnova.com/book/tpsa.tab and available for use here with permission. Each line in the file contains three tab-separated fields. The first line is the header; every other line defines one fragment contribution. The first field is the partial surface area contribution for each match of the SMARTS pattern in the second field, and the last field is a comment. Note that the first SMARTS definition contains a typo: it should be "[N+0;H0;D1;v3]" instead of "[N0;H0;D1;v3]".

To compute the topological polar surface area (for the purposes of this task) of a given structure, sum the fragment contributions, each weighted by the number of times that fragment matches.

Implementation

Write a function or method named "TPSA" which gets its data from the file "tpsa.tab". The function should take a molecule record as input and return the TPSA value as a float. Use the function to calculate the TPSA of "CN2C(=O)N(C)C(=O)C1=C2N=CN1C". The answer should be 61.82, which agrees exactly with Ertl's online TPSA tool but not with PubChem's value of 58.4.
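As a toolkit-independent sketch of the table handling and weighted sum described above, the Python code below parses lines in the tpsa.tab layout (header first, then tab-separated value, SMARTS and comment), applies the correction for the known typo, and sums contribution times match count. The function names `load_patterns` and `tpsa`, and the choice of passing the SMARTS matcher in as a callback, are my own illustrative assumptions, not part of any toolkit. It also assumes the faulty pattern appears in the file exactly as `[N0;H0;D1;v3]`. The sample rows in the usage part are hypothetical and not copied from tpsa.tab.

```python
from typing import Callable, List, Tuple

# Assumed spelling of the typo in the file, mapped to its correction.
TYPO_FIX = {"[N0;H0;D1;v3]": "[N+0;H0;D1;v3]"}


def load_patterns(lines: List[str]) -> List[Tuple[float, str]]:
    """Parse tpsa.tab-style lines into (contribution, SMARTS) pairs."""
    patterns = []
    for line in lines[1:]:              # skip the header line
        line = line.rstrip("\r\n")
        if not line.strip():
            continue                    # tolerate blank lines
        value, smarts, _comment = line.split("\t", 2)
        smarts = TYPO_FIX.get(smarts, smarts)
        patterns.append((float(value), smarts))
    return patterns


def tpsa(patterns: List[Tuple[float, str]],
         count_matches: Callable[[str], int]) -> float:
    """Sum of contribution * number of matches over all fragments."""
    return sum(value * count_matches(smarts) for value, smarts in patterns)


if __name__ == "__main__":
    # Hypothetical two-row table; values and patterns are illustrative only.
    sample = [
        "value\tsmarts\tcomment\n",
        "1.5\t[N0;H0;D1;v3]\tfirst\n",
        "2.0\t[O;D1]\tsecond\n",
    ]
    pats = load_patterns(sample)
    # Stand-in for a real toolkit: fixed match counts per pattern.
    fake_counts = {"[N+0;H0;D1;v3]": 1, "[O;D1]": 2}
    print(tpsa(pats, lambda s: fake_counts.get(s, 0)))  # 5.5
```

With RDKit, for example, the callback could be `lambda s: len(mol.GetSubstructMatches(Chem.MolFromSmarts(s)))`, mirroring the RDKit/Python solution; in real use you would parse each SMARTS once rather than on every call.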
Indigo/Python

```python
from indigo import Indigo
import collections

indigo = Indigo()

# Some place to store the pattern definitions
Pattern = collections.namedtuple("Pattern", ["value", "subsearch"])
patterns = []

# Get the patterns from the tpsa.tab file, ignoring the header line
for line in open("tpsa.tab").readlines()[1:]:
    # Extract the fields
    value, smarts, comment = line.split("\t")
    subsearch = indigo.loadSmarts(smarts)
    # Store for later use
    patterns.append(Pattern(float(value), subsearch))

# Helper function to count how many times a substructure matches
def count_matches(subsearch, mol):
    return indigo.substructureMatcher(mol).countMatches(subsearch)

def TPSA(mol):
    "Compute the topological polar surface area of a molecule"
    return sum(count_matches(pattern.subsearch, mol) * pattern.value
               for pattern in patterns)

# Test it with the reference structure
mol = indigo.loadMolecule("CN2C(=O)N(C)C(=O)C1=C2N=CN1C")
print(TPSA(mol))
```

OpenBabel/Rubabel

```ruby
require 'rubabel'

lines = IO.readlines("tpsa.tab")
header = lines.shift   # discard the header line
@patterns = lines.map {|line| line.chomp.split("\t") }

def TPSA(mol)
  @patterns.inject(0.0) {|s,p| s + p[0].to_f * mol.matches(p[1], false).size }
end

puts TPSA( Rubabel["CN2C(=O)N(C)C(=O)C1=C2N=CN1C"] )
```

OpenEye/Python

```python
from openeye.oechem import *
import collections

# Some place to store the pattern definitions
Pattern = collections.namedtuple("Pattern", ["value", "subsearch"])
patterns = []

# Get the patterns from the tpsa.tab file, ignoring the header line
for line in open("tpsa.tab").readlines()[1:]:
    # Extract the fields
    value, smarts, comment = line.split("\t")
    # Use the SMARTS to define a subsearch object
    subsearch = OESubSearch(smarts)
    # Store for later use
    patterns.append(Pattern(float(value), subsearch))

# Helper function to count how many times a substructure matches
def count_matches(subsearch, mol):
    return sum(1 for match in subsearch.Match(mol))

def TPSA(mol):
    "Compute the topological polar surface area of a molecule"
    return sum(count_matches(pattern.subsearch, mol) * pattern.value
               for pattern in patterns)

# Test it with the reference structure
mol = OEGraphMol()
OEParseSmiles(mol, "CN2C(=O)N(C)C(=O)C1=C2N=CN1C")
print(TPSA(mol))
```

RDKit/Python

```python
from rdkit import Chem
import collections

# Some place to store the pattern definitions
Pattern = collections.namedtuple("Pattern", ["value", "subsearch"])
patterns = []

# Get the patterns from the tpsa.tab file, ignoring the header line
for line in open("tpsa.tab").readlines()[1:]:
    # Extract the fields
    value, smarts, comment = line.split("\t")
    # Use the SMARTS to define a subsearch object
    subsearch = Chem.MolFromSmarts(smarts)
    # Store for later use
    patterns.append(Pattern(float(value), subsearch))

# Helper function to count how many times a substructure matches
def count_matches(subsearch, mol):
    return len(mol.GetSubstructMatches(subsearch))

def TPSA(mol):
    "Compute the topological polar surface area of a molecule"
    return sum(count_matches(pattern.subsearch, mol) * pattern.value
               for pattern in patterns)

# Test it with the reference structure
mol = Chem.MolFromSmiles("CN2C(=O)N(C)C(=O)C1=C2N=CN1C")
print(TPSA(mol))
```

Cactvs/Tcl

```tcl
set cactvs(aromaticity_model) daylight
set eh [ens create CN2C(=O)N(C)C(=O)C1=C2N=CN1C]
set tpsa 0.0
table loop [table read tpsa.tab] row {
    lassign $row v smarts
    set tpsa [expr {$tpsa + [match ss -charge 1 -mode distinct $smarts $eh] * $v}]
}
puts $tpsa
```

The table reader needs no detailed instructions: it automatically and correctly analyzes the structure of the parameter file. We need to switch the aromaticity model to the decidedly weird Daylight definition to get the requested result, because by default Cactvs does not consider exocyclic keto groups compatible with aromaticity. With its own model, the result is a familiar 58.44 (and that is no coincidence: it is PubChem's 58.4).
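To see why the aromaticity model changes the answer, both results can be reproduced by hand. The sketch below assigns each polar atom of caffeine to one of Ertl's fragment types and sums the contributions. The fragment labels and values (17.07 for a carbonyl oxygen, 4.93 for a substituted aromatic nitrogen, 12.89 for a two-connected aromatic nitrogen, 3.24 for an aliphatic nitrogen with three single bonds) are quoted from my recollection of Ertl's table rather than read from tpsa.tab, and the per-atom assignment is my own reading of the structure, so treat this as an illustration, not a reference calculation.

```python
# Fragment contributions as recalled from Ertl's table (assumption:
# verify against tpsa.tab). Keys use Ertl-style fragment notation.
CONTRIB = {
    "O=": 17.07,            # carbonyl-type oxygen
    "n(-*)(:*):*": 4.93,    # aromatic N carrying a substituent
    "n(:*):*": 12.89,       # aromatic N with two aromatic neighbours only
    "N(-*)(-*)-*": 3.24,    # aliphatic N with three single bonds
}

# Hand-assigned fragment counts for CN2C(=O)N(C)C(=O)C1=C2N=CN1C (caffeine).
# Daylight model: both rings aromatic, so all three N-methyl nitrogens
# are substituted aromatic nitrogens.
DAYLIGHT = {"O=": 2, "n(-*)(:*):*": 3, "n(:*):*": 1}
# Model in which exocyclic C=O breaks aromaticity: only the five-membered
# ring is aromatic, so the two nitrogens flanked by C=O become aliphatic.
IMIDAZOLE_ONLY = {"O=": 2, "N(-*)(-*)-*": 2, "n(-*)(:*):*": 1, "n(:*):*": 1}


def tpsa_from_counts(counts):
    """Weighted sum of fragment contributions, rounded to 2 decimals."""
    return round(sum(CONTRIB[frag] * n for frag, n in counts.items()), 2)


print(tpsa_from_counts(DAYLIGHT))        # 61.82
print(tpsa_from_counts(IMIDAZOLE_ONLY))  # 58.44
```

The whole difference comes from the two ring nitrogens between the carbonyl groups: 2 * (4.93 - 3.24) = 3.38, which is exactly 61.82 - 58.44.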
Cactvs/Python

```python
cactvs['aromaticity_model'] = 'daylight'
e = Ens('CN2C(=O)N(C)C(=O)C1=C2N=CN1C')
tpsa = 0.0
for row in Table.Read('tpsa.tab'):
    tpsa += match('ss', row[1], e, charge=True, mode='distinct') * row[0]
print(tpsa)
```